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ABSTRACT 

The linear algorithm of the Wiener filter and constrained realizations (CRs) of Gaus¬ 
sian random fields is extended here to perform non-linear CRs. The procedure consists 
of: (1) Using linear CR of low resolution data to construct a high resolution underlying 
field, as if the linear theory is valid; (2) Taking the linear CR backwards in time, by the 
linear theory, to set initial conditions for an N-body simulation; (3) Forwarding the field 
in time by an N-body code. An intermediate step might be introduced to ‘linearize’ the 
low resolution data. 

The non-linear CR can be applied to any observational data set that is linearly related 
to the underlying field. Here it is applied to the IRAS 1.2Jy catalog using 843 data points 
within a sphere of 6000 Km/s, to reconstruct the full non-linear large scale structure of 
our ‘local’ universe. 
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I. Introduction 


In the standard model of cosmology galaxies and the large scale structure of the uni¬ 
verse form out of a random perturbation field via gravitational instability. It is assumed 
that the primordial perturbation field constitutes a random homogenous and isotropic 
Gaussian field and that on relevant scales its amplitude is small, hence its evolution is 
described by the linear theory of gravitational instability (c/. Peebles 1980). The the¬ 
oretical study of structure formation has been a major effort of modern cosmology (c/. 
Padmanabhan 1993). On the observational side, the large scale structure has been studied 
mostly by means of red-shift surveys (c/. Strauss and Willick 1995) and peculiar velocities 
(c/. Dekel 1994). A method for the reconstruction of the underlying dynamical (density 
and velocity) fields from a given observational data base is presented here. 

The problem of recovering the underlying field from given observations, which by their 
nature are incomplete and have a finite accuracy and resolution, is one often encountered in 
many branches of physics and astronomy. It has been shown that for a random Gaussian 
field an optimal estimator of the underlying field is given by a minimal variance solu¬ 
tion (Zaroubi, Hoffman, Fisher and Lahav 1996; ZHFL), known also as the Wiener filter 
(hereafter WF, Wiener 1949, Press et al. 1986). This approach is based on the a priori 
knowledge of the statistical nature of the field, the so-called prior. Within the framework 
of Gaussian fields the WF coincides with the Bayesian posterior and the maximum entropy 
estimations (ZHFL). Indeed, in the cosmological case on large enough scales where linear 
theory applies and the (over) density and velocity fields are Gaussian the WF is the opti¬ 
mal tool for the reconstruction of the large scale structure. This is further complemented 
by the algorithm of constrained realizations (CRs) of Gaussian fields (Hoffman and Ribak 
1991) to create Monte Carlo simulations of the residual from this optimal estimation. This 
combined WF/CR approach has been applied recently to a variety of cosmological data 
bases in an effort to reconstruct the large scale structure. This includes the analysis of the 
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COBE/DMR data (Bunn et al. 1994), the analysis of the velocity potential (Ganon and 
Hoffman 1993), the reconstruction of the density field (Hoffman 1993, 1994, Lahav 1993, 
1994, Lahav et al. 1994) and the peculiar velocity field (Fisher et al. 1995) from the 
IRAS redshift survey (Fisher et al. 1993). 

A major limitation of the WF/CR approach is that it applies only in the linear regime. 
Yet, on small scales the perturbations are not small and the full non-linear gravitational 
instability theory has to be used. Here the WF/CR method is extended to the non-linear 
regime, and a new algorithm of non-linear constrained realizations (NLCRs) is presented. 
The general method is presented in §11 and its application to the IRAS 1.2Jy catalog is 
given in §111. The results are presented in §IV and a short discussion (§V) concludes this 
Letter. 


II. Non Linear Constrained Realizations 

The general WF/CR method has been fully described in ZHFL and only a very short 
outline of it is presented here. Consider the case of a set of observations performed on 
an underlying random field (with N degrees of freedom) s = {si,...,sjv} yielding M 
data points, d = {di, ...,cIm}- Here, only measurements that can be modeled as linear 
convolution or mapping on the field are considered. The act of observation is represented 
by 

d = Rs + e, (1) 

where R is a linear operator which represents the point spread function and e = {e,,..., cm} 
gives the statistical errors. Here the notion of a point spread function is extended to 
include any linear operation that relates the measurements to the underlying field. The 
WF estimator is: 

s WF = (sdt)(ddt) _1 d . (2) 

Here, /...^) represents an ensemble average and ^sd^ is the cross-correlation matrix of 
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the data and the underlying field. The data auto correlation matrix is 

= R^ss^RT ^ee^, (3) 

where the second term represents the statistical errors, i.e. shot noise. (See ZHFL for a 
general treatment of the error covariance matrix.) 

In the case of a random Gaussian field, the WF estimator coincides with the condi¬ 
tional mean field given the data. A CR of the random residual from the mean is obtained 
by creating an unconstrained realization of the underlying field (s) and the errors (e), and 
‘observing’ it the same way the actual universe is observed. Namely, a mock data base is 
created by: 

d = Rs + e, (4) 

A CR is then obtained simply by (Hoffman and Ribak 1991): 

s CR = s + ^sd^dd*|) (d — d). (5) 

The WF/CR is used now to uncover the finite resolution used to obtain the data. 
Thus, low resolution data is used to make high resolution CRs. Here, for simplicity the 
finite resolution is modeled by Gaussian smoothing, however any other kernel can be used 
as well. The low resolution is modeled by smoothing on scale Rl and the high resolution 
by Rs, where Rl > Rs- Often, the low resolution data correspond to obseravables in, or 
close to, the linear regime, while the high resolution field lies deep in the non-linear regime. 
However, the reconstruction is done within the framework of the linear theory. This might 
be a reasonable assumption for the low resolution data, but it is certainly inconsistent with 
the high resolution field. The following procedure is suggested here for the NLCR: I. Use 
low resolution data to construct a high resolution CR, as if linear theory holds on these 
small scales; II. Take this CR backwards to an early enough epoch by the linear theory; 
III. Use this CR, which now constitutes a given realization of the initial conditions of our 
‘local’ universe constrained by observational data, as an input to an N-body code to evolve 
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it to the present time. These three steps provides one with a NLCR given the observed 
data and the assumed prior model. 

The three steps proposed here are all consistent with the general framework of the 
standard model, namely Gaussian primordial perturbation field and gravitational insta¬ 
bility. Now, in the case where the smoothing indeed transforms dynamical variables to 
the linear regime, the constructed field provides one with a particular realization which is 
fully consistent with the assumed model. The quality of the reconstruction depends on the 
accuracy of the data and its sampling and on the nature of the prior , namely the ‘strength’ 
of the correlations. However, often it happens that smoothing does not take the data all 
the way to the linear regime. In such a case an intermediate step of mapping these quasi- 
linear variables to the linear ones has to be introduced. Such a mapping cannot be usually 
rigorously formulated and one should recourse to some approximations. These should be 
checked against N-body simulations and mock catalogs, to find a mapping suitable to the 
problem at hand. 


III. Application: The IRAS 1.2Jy Catalog 

The WF/CR and the NLCR presented here can be used with any data base whose 
relation to the underlying field can be modeled by Eq. 1. Thus, observations of the velocity 
field can be used to reconstruct the density field and vice versa. The concrete case studied 
here is the construction of the density and velocity fields from the IRAS 1.2Jy redshift 
survey. At present red-shift distortions are ignored, however the formalism can be easily 
extended to account for these as well (Zaroubi and Hoffman 1995). The sample is defined 
by its selection function, <j>(r), and the boundaries of the survey are defined by a mask of 
galactic latitude |6| < 5°. The prior assumed here is a CDM power spectrum with a shape 
parameter T = 0.2 and a normalization of as = 0.7 (c/. Strauss and Willick 1995). For 
simplicity no biasing and a flat universe are assumed. 

The underlying density field is evaluated on a Cartesian grid with a sampling rate 
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of 1000 Km/s (here distances are given in velocity units) within a sphere of 6000/im/s, 
excluding IRAS’ zone of avoidance. The discrete galaxy distribution is smoothed on a 
scale Rl = 1000 Km/s. This yields M = 834 data points: 


A a = A (r „)=[X:^_ e xp ( 


( r a ~ r ga l) ; 

2 Rl 


) -n / 


in 


( 6 ) 


where n is the mean number density of the IRAS galaxies. The data autocorrelation 
function is written as ^A a A + a a p. The first term is just the autocorrelation 
function of the smoothed field ( £ s (r) ), 


fa /3 = f s (|r a - 1731) = 


(2tt) 


P(k) exp (-(k,R L ) 2 ) exp(ik • (r Q - rp))d 3 k, (7) 


and the shot noise covariance matrix is: 


a, 


a/3 h(27ri?|)3/2 J 0( x ) 


exp 


(r a -x) 2 + (173 -x) 2 ^ 3 


2 Rl 


x 


( 8 ) 


Note that the kernel introduces off-diagonal terms in the error covariance matrix (ZHFL). 
The cross-correlation of the high resolution field and the low resolution data is: 


fate) = 


(2tt) 


P(k)ex p(- ^’ jRs - > + ( kRL ) ) exp (ik • (r, -r a ))d 3 fc (9) 


Defining the WF operator W z p, 


f^t/3 ^fa/3 T <7a/3^ 5 


( 10 ) 


and a linear high resolution is thus obtained by 


5(r i ) = 5(r i ) + ^(A j g-A j g). (11) 

The choice of Rl plays a crucial role in the NLCR algorithm and it involves conflicting 
considerations. On the one hand a small Rl is desired, so as to keep high resolution 
information, but on the other one a large Rl would minimize the shot noise errors and 
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would result in a ‘more’ linear estimator. Here a value of Rl = 1000 Km/s is chosen. 
To check the linearization of this smoothing a non-linear unconstrained realization of the 
assumed prior has been calculated by a PM N-body code. Now, the smoothed (scale Rl) 
(over)density is evaluated in two ways. One is done by smoothing the initial conditions 
and propagating it in time by the linear theory, yielding 5 L . The other, S NL , is obtained 
by smoothing the full non-linear density held. It is known that even on the 1000 Km/s 
smoothing scale there are systematic deviation and scatter from the desired 5 NL = 5 L 
relation (Nusser et al. 1991). One finds that on both ends of high amplitudes positive 
and negative Vs, 5 NL is larger than S L . Note that this is a pure dynamical phenomenon 
and the statistical shot noise does not affect it. A minimal variance fitting formula is 
calculated here, 5 L — f(S NL )S NL , and this is used to recover the linear held. Note that 
f(S NL ) depends on the assumed model and the smoothing kernel. A consistency check on 
this simple mapping is the evaluation of the 1-point distribution function of f(5 NL )5 NL . 
Indeed it is found to be very close to that of 5 L , namely a normal distribution. 

Various algorithms have been proposed to trace back non-linear perturbation held to 
the linear regime (c/. Strauss and Willick 1995). All of these ‘time machines’ recover 
the initial linear held in the case where the quasi-linear held is known exactly, with no 
statistical uncertainty. The case of real observational data where the shot noise increases 
with distance, poses a much more difficult problem. As one goes further away the data 
becomes more dominant by the shot noise and in the mean the amplitude of the measured 
held increases with distance. Thus, before applying any ‘time machine’ the signal has to 
be first cleaned from the noise and only then it can be traced back to the linear regime. 

The phenomenological hx to the ‘non-linearity’ of the smoothed data which is used 
here consists of two steps. First, to account for the scatter in the ( S NL ,S L ) relation a 
new term is introduced to the data auto-covariance matrix, a NL . Dealing with the scatter 
by statistical means is a manifestation of our inability to invert the exact non-local non¬ 
linear mapping from the linear to the quasi-linear regime. Here we go to the extreme 
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simplification and take a^ = const.5 a p. The value of the constant term is determine by 
the requirement that y 2 /<i.o./. = 1 , where the y 2 takes into account the cosmic variance, 
shot noise and a NL . A WF estimator of the iiT-smoothed field is obtained by applying a 
WF on the data, where Rs is replaced by Rl to obtain low resolution, 


A*. (12) 

Rs=Rl 

The estimation of the quasi-linear correction is given by (l — /(c) WF ’Q L )(f WF ’Q L . This 
correction is evaluated at grid points r Q and is used to correct the data points: 



= A a - (1 - f(6 w f,Ql^wf,ql_ ( 13 ) 

The modified (‘linearized’) A^’s are now substituted in Eq. 10 to obtain a high resolution 
CR of the underlying linear field, given the actual data. 

The linearization procedure presented here behaves as follows. In the limit of distant 
data points, where the data is dominated by shot noise, the WF attenuates the estimated 
field towards zero amplitude. Substituting the resulting estimator in Eq. 12 would hardly 
change its value. The WF/CR is therefore dominated by the random residual, and con¬ 
sequently the resulting realization lies in the linear regime. For nearby data points where 
the shot noise is negligible, the WF leaves the signal almost untouched ^ WF ’Q L « A. In 
such a case the fitting formula would linearized the data, as has been checked against the 
N-body simulations. 


IV. Volume Limited IRAS Catalog 

The IRAS 1.2Jy catalog consists of 5321 galaxies These are used to evaluate the 
smoothed density field on a Cartesian grid of 1000 Km/s spacing within a 6000/im/s, 
excluding the zone of avoidance, yielding 844 data points. NLCRs are created on a finer 
64 3 grid of 250 Km/s spacing. A PM N-body code that is used here is based on an FFT 
Poisson solver. For a CDM-like power spectrum the structure within the 6000/im/s would 



be hardly affected by the periodicity on the ±8000/im/s box. A comprehensive analysis 
of NLCR, including detailed comparison of reconstruction of mock catalogs and sensitivity 
to the assumed prior has been conducted and will be presented in a forthcoming paper 
(Bistolas and Hoffman, 1995a) 

The IRAS galaxy distribution is presented in Fig. 1, where the projected galaxy 
distribution (within ±1000 Km/s) on the nine planes of SGX , SGY, SGZ = ±3000, 0 Km/s 
is given. The full N-body distribution of the NLCR is presented in Fig. 2 in a manner 
similar to that of Fig. 1. Note that this consists a realization of a ‘volume limited’ IRAS 
catalog. Finally, the non-linear reconstructed field at 500 Km/s smoothing is presented 
in Fig. 3. A full analysis of the cosmography revealed by the NLCR will be given in a 
forthcoming publication (Bistolas and Hoffman 1995b). Here we just point to the seemingly 
filamentry structure of the reconstruced galaxy distribution. A closer inspection shows that 
the generic feature here is more planar (2D) rather than a filamentry (ID) structure, and 
the apparent filaments are the intersection of the sheets with the planes defined by the 
plots. Different NLCRs have been performed to study the variance implied by the prior 
and the data and relatively small scatter is found between the different realizations. In 
particular, the existence and location of peaks and troughs is a very robust feature of the 
realization with small scatter in their amplitudes. Also the sheets and filaments remain 
invariant under the different realizations, however their ‘sharpness’ varies somewhat. A 
comparison of two such realizations is given in Fig. 4, where the ‘volume limited’ galaxy 
distribution and 500 Km/s smoothed 5-field at the supergalactic (SGZ = 0) plane are 
plotted. 


V. Discussion 

The NLCR algorithm presented here enables one to perform controlled Monte Carlo 
N-body simulations of the formation of our ‘local’ universe. These are designed to recover 
the actual observational data, used to constrain them, within the statistical uncertainties 
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of the data. The new ingredient introduced here is the reconstruction of the non-linear 
regime, i.e. the extrapolation in Fourier space from small to large wavenumbers that are 
deep in the non-linear regime. 

The NLCR introduced here can serve as a tool for studying and analyzing the large 
scale structure of the universe. Some of the obvious problems where NLCRs are expected 
to be very useful are: (1) The reconstruction of the velocity field from redshift catalogs; 
(2) Mapping the zone of avoidance and extrapolating the dynamical fields into unobserved 
regions; (3) Studying the dynamics of actually observed rich clusters with the actual initial 
and boundary conditions; (4) Analysis of filaments and pancakes as probes of the initial 
conditions and the cosmological model; (5) The NLCR can serve as a probe of the biasing 
mechanism. The main virtue here lies in the fact that different data sets, which in principal 
can represent different biasing of the underlying dynamical field, can be used to simulta¬ 
neously set constraints on the realizations. Given all these and the technical simplicity of 
the algorithm we expect it to be a standard tool of N-body and gas dynamical simulations. 

At the time this Letter has been written Kolatt et al. (1995) have reported on a similar 
project of NLCR of the IRAS 1.2Jy catalog. Their procedure differs from the present 
one mainly in not distinguishing between the low resolution (data) and high resolution 
(realizations). The input data is smoothed on the 500 Km/s scale and is heavily dominated 
by the noise, which is ‘removed’ by a power preserving modified WF. The modified filter 
is designed to preserve the power, regardless of the noise level. The resulting estimator 
is therefore more dominated by the noise and less by the prior model compared to our 
method. Yet, both methods seem to yield similar results and are equally efficient. Detailed 
comparisons against N-body simulations are needed to judge the merits of each method. 
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Figure Captions 


Fig. 1: Raw data: The IRAS 1.2Jy galaxies. The galaxy distribution is presented in 
9 planar slabs of thickness of ±10 h~ 1 Mpc. (Supergalactic coordinates are used and 
distances are given in h~ 1 Mpc , where h is Hubble’s constant in units of 100 Km/s/Mpc.) 

Fig. 2. ‘Volume limited’ IRAS catalog: A non-linear constrained realization based on the 
IRAS galaxy distribution. The full N-body particle distribution has been diluted to the 
mean IRAS galaxies mean number density. 

Fig. 3. Gaussian smoothing: The non-linear constrained realization shown in Fig. 2 is 
Gaussian smoothed on a scale R = 500 Km/s. Contour spacing is 0.2 and the dashed 
lines correspond to negative values of 5. 

Fig. 4. Different realizations: A comparison of different non-linear constrained realizations 
of the same input data and prior is presented here by the ‘galaxy’ distribution and the 
contour plots. Upper and lower raws correspond to two different realizations. The galaxy 
distribution and contour plots are presented in the same way as in Figs. 2 and 3. 
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I. Introduction 


In the standard model of cosmology galaxies and the large scale structure of the uni¬ 
verse form out of a random perturbation field via gravitational instability. It is assumed 
that the primordial perturbation field constitutes a random homogenous and isotropic 
Gaussian field and that on relevant scales its amplitude is small, hence its evolution is 
described by the linear theory of gravitational instability (c/. Peebles 1980). The the¬ 
oretical study of structure formation has been a major effort of modern cosmology (c/. 
Padmanabhan 1993). On the observational side, the large scale structure has been studied 
mostly by means of red-shift surveys (c/. Strauss and Willick 1995) and peculiar velocities 
(c/. Dekel 1994). A method for the reconstruction of the underlying dynamical (density 
and velocity) fields from a given observational data base is presented here. 

The problem of recovering the underlying field from given observations, which by their 
nature are incomplete and have a finite accuracy and resolution, is one often encountered in 
many branches of physics and astronomy. It has been shown that for a random Gaussian 
field an optimal estimator of the underlying field is given by a minimal variance solu¬ 
tion (Zaroubi, Hoffman, Fisher and Lahav 1996; ZHFL), known also as the Wiener filter 
(hereafter WF, Wiener 1949, Press et al. 1986). This approach is based on the a priori 
knowledge of the statistical nature of the field, the so-called prior. Within the framework 
of Gaussian fields the WF coincides with the Bayesian posterior and the maximum entropy 
estimations (ZHFL). Indeed, in the cosmological case on large enough scales where linear 
theory applies and the (over) density and velocity fields are Gaussian the WF is the opti¬ 
mal tool for the reconstruction of the large scale structure. This is further complemented 
by the algorithm of constrained realizations (CRs) of Gaussian fields (Hoffman and Ribak 
1991) to create Monte Carlo simulations of the residual from this optimal estimation. This 
combined WF/CR approach has been applied recently to a variety of cosmological data 
bases in an effort to reconstruct the large scale structure. This includes the analysis of the 
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COBE/DMR data (Bunn et al. 1994), the analysis of the velocity potential (Ganon and 
Hoffman 1993), the reconstruction of the density field (Hoffman 1993, 1994, Lahav 1993, 
1994, Lahav et al. 1994) and the peculiar velocity field (Fisher et al. 1995) from the 
IRAS redshift survey (Fisher et al. 1993). 

A major limitation of the WF/CR approach is that it applies only in the linear regime. 
Yet, on small scales the perturbations are not small and the full non-linear gravitational 
instability theory has to be used. Here the WF/CR method is extended to the non-linear 
regime, and a new algorithm of non-linear constrained realizations (NLCRs) is presented. 
The general method is presented in §11 and its application to the IRAS 1.2Jy catalog is 
given in §111. The results are presented in §IV and a short discussion (§V) concludes this 
Letter. 


II. Non Linear Constrained Realizations 

The general WF/CR method has been fully described in ZHFL and only a very short 
outline of it is presented here. Consider the case of a set of observations performed on 
an underlying random field (with N degrees of freedom) s = {si,...,sjv} yielding M 
data points, d = {di, ...,cIm}- Here, only measurements that can be modeled as linear 
convolution or mapping on the field are considered. The act of observation is represented 
by 

d = Rs + e, (1) 

where R is a linear operator which represents the point spread function and e = {e,,..., cm} 
gives the statistical errors. Here the notion of a point spread function is extended to 
include any linear operation that relates the measurements to the underlying field. The 
WF estimator is: 

s WF = (sdt)(ddt) _1 d . (2) 

Here, /...^) represents an ensemble average and ^sd^ is the cross-correlation matrix of 
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the data and the underlying field. The data auto correlation matrix is 

= R^ss^RT ^ee^, (3) 

where the second term represents the statistical errors, i.e. shot noise. (See ZHFL for a 
general treatment of the error covariance matrix.) 

In the case of a random Gaussian field, the WF estimator coincides with the condi¬ 
tional mean field given the data. A CR of the random residual from the mean is obtained 
by creating an unconstrained realization of the underlying field (s) and the errors (e), and 
‘observing’ it the same way the actual universe is observed. Namely, a mock data base is 
created by: 

d = Rs + e, (4) 

A CR is then obtained simply by (Hoffman and Ribak 1991): 

s CR = s + ^sd^dd*|) (d — d). (5) 

The WF/CR is used now to uncover the finite resolution used to obtain the data. 
Thus, low resolution data is used to make high resolution CRs. Here, for simplicity the 
finite resolution is modeled by Gaussian smoothing, however any other kernel can be used 
as well. The low resolution is modeled by smoothing on scale Rl and the high resolution 
by Rs, where Rl > Rs- Often, the low resolution data correspond to obseravables in, or 
close to, the linear regime, while the high resolution field lies deep in the non-linear regime. 
However, the reconstruction is done within the framework of the linear theory. This might 
be a reasonable assumption for the low resolution data, but it is certainly inconsistent with 
the high resolution field. The following procedure is suggested here for the NLCR: I. Use 
low resolution data to construct a high resolution CR, as if linear theory holds on these 
small scales; II. Take this CR backwards to an early enough epoch by the linear theory; 
III. Use this CR, which now constitutes a given realization of the initial conditions of our 
‘local’ universe constrained by observational data, as an input to an N-body code to evolve 
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it to the present time. These three steps provides one with a NLCR given the observed 
data and the assumed prior model. 

The three steps proposed here are all consistent with the general framework of the 
standard model, namely Gaussian primordial perturbation field and gravitational insta¬ 
bility. Now, in the case where the smoothing indeed transforms dynamical variables to 
the linear regime, the constructed field provides one with a particular realization which is 
fully consistent with the assumed model. The quality of the reconstruction depends on the 
accuracy of the data and its sampling and on the nature of the prior , namely the ‘strength’ 
of the correlations. However, often it happens that smoothing does not take the data all 
the way to the linear regime. In such a case an intermediate step of mapping these quasi- 
linear variables to the linear ones has to be introduced. Such a mapping cannot be usually 
rigorously formulated and one should recourse to some approximations. These should be 
checked against N-body simulations and mock catalogs, to find a mapping suitable to the 
problem at hand. 


III. Application: The IRAS 1.2Jy Catalog 

The WF/CR and the NLCR presented here can be used with any data base whose 
relation to the underlying field can be modeled by Eq. 1. Thus, observations of the velocity 
field can be used to reconstruct the density field and vice versa. The concrete case studied 
here is the construction of the density and velocity fields from the IRAS 1.2Jy redshift 
survey. At present red-shift distortions are ignored, however the formalism can be easily 
extended to account for these as well (Zaroubi and Hoffman 1995). The sample is defined 
by its selection function, <j>(r), and the boundaries of the survey are defined by a mask of 
galactic latitude |6| < 5°. The prior assumed here is a CDM power spectrum with a shape 
parameter T = 0.2 and a normalization of as = 0.7 (c/. Strauss and Willick 1995). For 
simplicity no biasing and a flat universe are assumed. 

The underlying density field is evaluated on a Cartesian grid with a sampling rate 
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of 1000 Km/s (here distances are given in velocity units) within a sphere of 6000/im/s, 
excluding IRAS’ zone of avoidance. The discrete galaxy distribution is smoothed on a 
scale Rl = 1000 Km/s. This yields M = 834 data points: 


A a = A (r „)=[X:^_ e xp ( 


( r a ~ r ga l) ; 

2 Rl 


) -n / 


in 


( 6 ) 


where n is the mean number density of the IRAS galaxies. The data autocorrelation 
function is written as ^A a A + a a p. The first term is just the autocorrelation 
function of the smoothed field ( £ s (r) ), 


fa /3 = f s (|r a - 1731) = 


(2tt) 


P(k) exp (-(k,R L ) 2 ) exp(ik • (r Q - rp))d 3 k, (7) 


and the shot noise covariance matrix is: 


a, 


a/3 h(27ri?|)3/2 J 0( x ) 


exp 


(r a -x) 2 + (173 -x) 2 ^ 3 


2 Rl 


x 


( 8 ) 


Note that the kernel introduces off-diagonal terms in the error covariance matrix (ZHFL). 
The cross-correlation of the high resolution field and the low resolution data is: 


fate) = 


(2tt) 


P(k)ex p(- ^’ jRs - > + ( kRL ) ) exp (ik • (r, -r a ))d 3 fc (9) 


Defining the WF operator W z p, 


f^t/3 ^fa/3 T <7a/3^ 5 


( 10 ) 


and a linear high resolution is thus obtained by 


5(r i ) = 5(r i ) + ^(A j g-A j g). (11) 

The choice of Rl plays a crucial role in the NLCR algorithm and it involves conflicting 
considerations. On the one hand a small Rl is desired, so as to keep high resolution 
information, but on the other one a large Rl would minimize the shot noise errors and 
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would result in a ‘more’ linear estimator. Here a value of Rl = 1000 Km/s is chosen. 
To check the linearization of this smoothing a non-linear unconstrained realization of the 
assumed prior has been calculated by a PM N-body code. Now, the smoothed (scale Rl) 
(over)density is evaluated in two ways. One is done by smoothing the initial conditions 
and propagating it in time by the linear theory, yielding 5 L . The other, S NL , is obtained 
by smoothing the full non-linear density held. It is known that even on the 1000 Km/s 
smoothing scale there are systematic deviation and scatter from the desired 5 NL = 5 L 
relation (Nusser et al. 1991). One finds that on both ends of high amplitudes positive 
and negative Vs, 5 NL is larger than S L . Note that this is a pure dynamical phenomenon 
and the statistical shot noise does not affect it. A minimal variance fitting formula is 
calculated here, 5 L — f(S NL )S NL , and this is used to recover the linear held. Note that 
f(S NL ) depends on the assumed model and the smoothing kernel. A consistency check on 
this simple mapping is the evaluation of the 1-point distribution function of f(5 NL )5 NL . 
Indeed it is found to be very close to that of 5 L , namely a normal distribution. 

Various algorithms have been proposed to trace back non-linear perturbation held to 
the linear regime (c/. Strauss and Willick 1995). All of these ‘time machines’ recover 
the initial linear held in the case where the quasi-linear held is known exactly, with no 
statistical uncertainty. The case of real observational data where the shot noise increases 
with distance, poses a much more difficult problem. As one goes further away the data 
becomes more dominant by the shot noise and in the mean the amplitude of the measured 
held increases with distance. Thus, before applying any ‘time machine’ the signal has to 
be first cleaned from the noise and only then it can be traced back to the linear regime. 

The phenomenological hx to the ‘non-linearity’ of the smoothed data which is used 
here consists of two steps. First, to account for the scatter in the ( S NL ,S L ) relation a 
new term is introduced to the data auto-covariance matrix, a NL . Dealing with the scatter 
by statistical means is a manifestation of our inability to invert the exact non-local non¬ 
linear mapping from the linear to the quasi-linear regime. Here we go to the extreme 
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simplification and take a^ = const.5 a p. The value of the constant term is determine by 
the requirement that y 2 /<i.o./. = 1 , where the y 2 takes into account the cosmic variance, 
shot noise and a NL . A WF estimator of the iiT-smoothed field is obtained by applying a 
WF on the data, where Rs is replaced by Rl to obtain low resolution, 


A*. (12) 

Rs=Rl 

The estimation of the quasi-linear correction is given by (l — /(c) WF ’Q L )(f WF ’Q L . This 
correction is evaluated at grid points r Q and is used to correct the data points: 



= A a - (1 - f(6 w f,Ql^wf,ql_ ( 13 ) 

The modified (‘linearized’) A^’s are now substituted in Eq. 10 to obtain a high resolution 
CR of the underlying linear field, given the actual data. 

The linearization procedure presented here behaves as follows. In the limit of distant 
data points, where the data is dominated by shot noise, the WF attenuates the estimated 
field towards zero amplitude. Substituting the resulting estimator in Eq. 12 would hardly 
change its value. The WF/CR is therefore dominated by the random residual, and con¬ 
sequently the resulting realization lies in the linear regime. For nearby data points where 
the shot noise is negligible, the WF leaves the signal almost untouched ^ WF ’Q L « A. In 
such a case the fitting formula would linearized the data, as has been checked against the 
N-body simulations. 


IV. Volume Limited IRAS Catalog 

The IRAS 1.2Jy catalog consists of 5321 galaxies These are used to evaluate the 
smoothed density field on a Cartesian grid of 1000 Km/s spacing within a 6000/im/s, 
excluding the zone of avoidance, yielding 844 data points. NLCRs are created on a finer 
64 3 grid of 250 Km/s spacing. A PM N-body code that is used here is based on an FFT 
Poisson solver. For a CDM-like power spectrum the structure within the 6000/im/s would 



be hardly affected by the periodicity on the ±8000/im/s box. A comprehensive analysis 
of NLCR, including detailed comparison of reconstruction of mock catalogs and sensitivity 
to the assumed prior has been conducted and will be presented in a forthcoming paper 
(Bistolas and Hoffman, 1995a) 

The IRAS galaxy distribution is presented in Fig. 1, where the projected galaxy 
distribution (within ±1000 Km/s) on the nine planes of SGX , SGY, SGZ = ±3000, 0 Km/s 
is given. The full N-body distribution of the NLCR is presented in Fig. 2 in a manner 
similar to that of Fig. 1. Note that this consists a realization of a ‘volume limited’ IRAS 
catalog. Finally, the non-linear reconstructed field at 500 Km/s smoothing is presented 
in Fig. 3. A full analysis of the cosmography revealed by the NLCR will be given in a 
forthcoming publication (Bistolas and Hoffman 1995b). Here we just point to the seemingly 
filamentry structure of the reconstruced galaxy distribution. A closer inspection shows that 
the generic feature here is more planar (2D) rather than a filamentry (ID) structure, and 
the apparent filaments are the intersection of the sheets with the planes defined by the 
plots. Different NLCRs have been performed to study the variance implied by the prior 
and the data and relatively small scatter is found between the different realizations. In 
particular, the existence and location of peaks and troughs is a very robust feature of the 
realization with small scatter in their amplitudes. Also the sheets and filaments remain 
invariant under the different realizations, however their ‘sharpness’ varies somewhat. A 
comparison of two such realizations is given in Fig. 4, where the ‘volume limited’ galaxy 
distribution and 500 Km/s smoothed 5-field at the supergalactic (SGZ = 0) plane are 
plotted. 


V. Discussion 

The NLCR algorithm presented here enables one to perform controlled Monte Carlo 
N-body simulations of the formation of our ‘local’ universe. These are designed to recover 
the actual observational data, used to constrain them, within the statistical uncertainties 
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of the data. The new ingredient introduced here is the reconstruction of the non-linear 
regime, i.e. the extrapolation in Fourier space from small to large wavenumbers that are 
deep in the non-linear regime. 

The NLCR introduced here can serve as a tool for studying and analyzing the large 
scale structure of the universe. Some of the obvious problems where NLCRs are expected 
to be very useful are: (1) The reconstruction of the velocity field from redshift catalogs; 
(2) Mapping the zone of avoidance and extrapolating the dynamical fields into unobserved 
regions; (3) Studying the dynamics of actually observed rich clusters with the actual initial 
and boundary conditions; (4) Analysis of filaments and pancakes as probes of the initial 
conditions and the cosmological model; (5) The NLCR can serve as a probe of the biasing 
mechanism. The main virtue here lies in the fact that different data sets, which in principal 
can represent different biasing of the underlying dynamical field, can be used to simulta¬ 
neously set constraints on the realizations. Given all these and the technical simplicity of 
the algorithm we expect it to be a standard tool of N-body and gas dynamical simulations. 

At the time this Letter has been written Kolatt et al. (1995) have reported on a similar 
project of NLCR of the IRAS 1.2Jy catalog. Their procedure differs from the present 
one mainly in not distinguishing between the low resolution (data) and high resolution 
(realizations). The input data is smoothed on the 500 Km/s scale and is heavily dominated 
by the noise, which is ‘removed’ by a power preserving modified WF. The modified filter 
is designed to preserve the power, regardless of the noise level. The resulting estimator 
is therefore more dominated by the noise and less by the prior model compared to our 
method. Yet, both methods seem to yield similar results and are equally efficient. Detailed 
comparisons against N-body simulations are needed to judge the merits of each method. 
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Figure Captions 


Fig. 1: Raw data: The IRAS 1.2Jy galaxies. The galaxy distribution is presented in 
9 planar slabs of thickness of ±10 h~ 1 Mpc. (Supergalactic coordinates are used and 
distances are given in h~ 1 Mpc , where h is Hubble’s constant in units of 100 Km/s/Mpc.) 

Fig. 2. ‘Volume limited’ IRAS catalog: A non-linear constrained realization based on the 
IRAS galaxy distribution. The full N-body particle distribution has been diluted to the 
mean IRAS galaxies mean number density. 

Fig. 3. Gaussian smoothing: The non-linear constrained realization shown in Fig. 2 is 
Gaussian smoothed on a scale R = 500 Km/s. Contour spacing is 0.2 and the dashed 
lines correspond to negative values of 5. 

Fig. 4. Different realizations: A comparison of different non-linear constrained realizations 
of the same input data and prior is presented here by the ‘galaxy’ distribution and the 
contour plots. Upper and lower raws correspond to two different realizations. The galaxy 
distribution and contour plots are presented in the same way as in Figs. 2 and 3. 
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